Abstract

We show the advantage of Quaternary Decision Diagrams (QDDs) in
representing and evaluating logic functions. Specifically, we show
how QDDs are used to implement QDD machines, which yield high-speed
implementations. We compare QDD machines with binary decision diagram
(BDD) machines, and show a speed improvement of 1.28 to 2.02 times
when QDDs are chosen. We consider 1- and 2-address BDD machines and
3- and 4-address QDD machines, and we show a method to minimize the
number of instructions.


1  Introduction 

Branching program machines for BDDs have been used in control
applications [2, 5, 6, 7]. Fast response is especially important in
control applications, which usually have hundreds of inputs. For such
applications, a general-purpose microprocessor (MPU) cannot meet the
speed requirements. A branching program machine can be several times
faster than an MPU: an ordinary MPU requires two or three machine
instructions to read and test one input variable, while the branching
program machine requires just one instruction [3].

In this paper, we use a Quaternary Decision Diagram (QDD) to
implement a branching program machine. Although the QDD machine
requires longer instruction words than the BDD machine, it is 1.3 to
2.0 times faster than the corresponding BDD machine. In the past,
when the price of memory was high, 16-bit controllers were popular
[13, 25]. Nowadays memory is cheaper, and a 32-bit or wider
architecture is often used to increase the performance of
controllers. In this paper, therefore, we show a method to increase
performance by increasing the number of bits in a word.

The rest of this paper is organized as follows: Section 2 introduces
a method to represent multi-output logic functions by multi-valued
decision diagrams. Section 3 introduces branching program machines,
including both a 4-address QDD machine and a 3-address QDD machine.
The 3-address QDD machine requires less memory than the 4-address
QDD machine. Section 4 presents the code optimization problem for
3-address QDD machines. Section 5 shows the experimental results.
Finally, Section 6 concludes the paper.

2  Representation of Multiple-Output Functions

2.1  Multi-Valued  Decision  Diagrams 

An arbitrary n-variable logic function can be represented by a binary
decision diagram (BDD). Evaluation of a BDD requires up to n table
look-ups. Fig. 2.1 shows an example of an MTBDD (multi-terminal
binary decision diagram). In this case, many outputs can be evaluated
at the same time. To further speed up the evaluation, a
multiple-valued decision diagram (MDD) is used. In the MDD(k), k
variables are grouped to form a $2^k$-valued super variable. To
evaluate the MDD(k), we need at most $\lceil n/k \rceil$ table
look-ups [15, 19]. When the function is represented by an MDD(k), the
evaluation of a logic function can be k times faster than with the
corresponding BDD¹. Thus, a larger k yields a faster evaluation of
the MDD(k). Unfortunately, the size of memory to represent a node for
an MDD(k) is proportional to $2^k$, as shown in Fig. 2.2. For many
benchmark functions, the total size of the memory for an MDD(k)
achieves its minimum when k = 2 [19]. Therefore, in logic evaluation,
MDD(2)s are more suitable than BDDs. Since nodes in an MDD(2) have
4 branches, it is termed a Quaternary Decision Diagram (QDD).

¹ This is true only when the MDD(k) and the BDD are quasi-reduced.
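The trade-off between evaluation speed and node size can be made concrete with a short sketch. The helper names `lookups` and `branches_per_node` are illustrative assumptions, not part of the original work. The sketch only counts table look-ups and branches per node; it says nothing about the total number of nodes, which depends on the function.

```python
import math


def lookups(n: int, k: int) -> int:
    """Upper bound on table look-ups to evaluate an n-variable MDD(k)."""
    return math.ceil(n / k)


def branches_per_node(k: int) -> int:
    """Number of outgoing branches of one MDD(k) node (2^k)."""
    return 2 ** k


# Example: a 36-input function for k = 1 (BDD), 2 (QDD), 3, and 4.
for k in (1, 2, 3, 4):
    print(k, lookups(36, k), branches_per_node(k))
```

For k = 2 the number of look-ups is halved relative to a BDD while each node needs four branch fields instead of two.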
Figure 2.2. Nodes for MDD(k).

Figure 2.3. Conversion of BDD to MDD(2).


2.2  Optimization  of  MDDs 

In an MDD(k), the evaluation of an n-variable logic function can be
done by at most $\lceil n/k \rceil$ table look-ups. So, the major
problem is the minimization of the number of nodes. In general, it is
not easy to obtain an MDD(k) with the minimum number of nodes. The
following heuristic method is used to obtain near-minimal MDDs:

1. Minimize the number of nodes of the BDD by a heuristic method [21].

2. Partition the input variables to generate an MDD(k) [22].

Fig. 2.3 shows an example of a conversion from a BDD into an MDD(2).
In the above MDDs, we assume each group of variables has the same
size. Such MDDs are homogeneous MDDs. When the groups have different
sizes, the MDD is a heterogeneous MDD. For simplicity, in this paper,
we consider only homogeneous MDDs.
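The second step, grouping pairs of binary variables, can be sketched as follows. This is a minimal illustration under stated assumptions, not the partitioning method of [22]: the dictionary encoding, the helper names `step` and `to_qdd`, and the fixed pairing (x1,x2), (x3,x4) are all hypothetical. The input is the MTBDD whose program is listed in Example 3.1.

```python
# MTBDD of Example 3.1: node -> (variable index, 0-child, 1-child).
# Names that do not appear as keys are terminal nodes.
BDD = {
    "N0": (1, "N2", "N1"),
    "N1": (2, "N2", "T4"),
    "N2": (3, "N3", "N4"),
    "N3": (4, "T0", "T1"),
    "N4": (4, "T2", "T3"),
}


def step(bdd, node, var, bit):
    """Follow one binary decision if `node` tests `var`; otherwise stay put."""
    if node in bdd and bdd[node][0] == var:
        return bdd[node][1 + bit]
    return node


def to_qdd(bdd, root):
    """Group (x1,x2), (x3,x4), ... into 4-valued super variables X1, X2, ..."""
    qdd, todo = {}, [root]
    while todo:
        node = todo.pop()
        if node not in bdd or node in qdd:
            continue  # terminal node, or already converted
        pair = (bdd[node][0] + 1) // 2            # index of the super variable
        first, second = 2 * pair - 1, 2 * pair    # the two grouped variables
        kids = tuple(
            step(bdd, step(bdd, node, first, a), second, b)
            for a in (0, 1)
            for b in (0, 1)
        )
        qdd[node] = (pair, kids)                  # children for 00, 01, 10, 11
        todo.extend(kids)
    return qdd


QDD = to_qdd(BDD, "N0")
```

Under these assumptions the five non-terminal BDD nodes collapse into two QDD nodes: `N0` with children (N2, N2, N2, T4) and `N2` with children (T0, T1, T2, T3).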

3  Branching Program Machine

Special machines to evaluate MDDs have been developed [8, 9, 10].
Unfortunately, they are unsuitable for practical applications. Here,
we consider a machine whose architecture is well-suited for
evaluating MDDs and is also easily programmed.

3.1  2-Address BDD Machine

A branching program for BDDs uses only two kinds of instructions:

B_Branch(ADDR0, ADDR1), INDEX
Output DATA, and GOTO ADDR

The first one is the binary branch instruction, which is similar to
the computed GOTO statement of the FORTRAN language: if the value of
the variable specified by INDEX is equal to 0, then go to ADDR0;
otherwise go to ADDR1. The second one performs the output operation
followed by an unconditional GOTO operation.

Example 3.1 Consider the MTBDD shown in Fig. 2.1. The following code
evaluates the MTBDD:

N0: B_Branch(N2, N1), X1
N1: B_Branch(N2, T4), X2
N2: B_Branch(N3, N4), X3
N3: B_Branch(T0, T1), X4
N4: B_Branch(T2, T3), X4
T0: Output 0, and GOTO N0
T1: Output 9, and GOTO N0
T2: Output 10, and GOTO N0
T3: Output 11, and GOTO N0
T4: Output 15, and GOTO N0

In this example, DATA in Output DATA is the decimal equivalent of the
function output values expressed in binary as $(f_3, f_2, f_1, f_0)$.
(End of Example)
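To make the semantics of the two instruction kinds concrete, the following sketch interprets the program of Example 3.1 in software. The tuple encoding and the function name `run_bdd` are assumptions made for illustration; the actual machine is a hardware circuit.

```python
# Program of Example 3.1: label -> instruction.
PROGRAM = {
    "N0": ("branch", "X1", "N2", "N1"),
    "N1": ("branch", "X2", "N2", "T4"),
    "N2": ("branch", "X3", "N3", "N4"),
    "N3": ("branch", "X4", "T0", "T1"),
    "N4": ("branch", "X4", "T2", "T3"),
    "T0": ("output", 0, "N0"),
    "T1": ("output", 9, "N0"),
    "T2": ("output", 10, "N0"),
    "T3": ("output", 11, "N0"),
    "T4": ("output", 15, "N0"),
}


def run_bdd(program, inputs, start="N0"):
    """Run until the first Output; return (output value, instructions executed)."""
    pc, steps = start, 0
    while True:
        instr = program[pc]
        steps += 1
        if instr[0] == "output":
            return instr[1], steps
        _, index, addr0, addr1 = instr
        # B_Branch: variable value 0 selects ADDR0, value 1 selects ADDR1.
        pc = addr1 if inputs[index] else addr0
```

For instance, `run_bdd(PROGRAM, {"X1": 1, "X2": 1, "X3": 0, "X4": 0})` follows N0, N1, T4 and yields the output 15 after three instructions.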


Figure 3.1. 2-address BDD Machine.

Figure 3.2. 1-address BDD Machine.


Fig. 3.1 shows the architecture of the 2-address BDD machine, where
only the circuit for the branching operation is shown. The first
field of the branching instruction specifies the branch command. The
second field, INDEX, specifies the index i of the input variable
$x_i$, i.e., which variable to select. The input selector in Fig. 3.1
produces the value of the variable $x_i$, which selects the next
branch address. When $x_i = 0$, ADDR0 is selected. Otherwise, ADDR1
is selected. The selected address is then loaded into the program
counter (PC). In this way, the next address is specified. To reduce
the width of the instruction words, the 1-address BDD machines shown
in Fig. 3.2 have been developed [2, 6, 13, 25]. In this case, when
the value of the variable specified by INDEX is 1, the machine works
similarly to the 2-address BDD machine. Otherwise, the content of the
program counter (PC) is incremented by one to access the next
address. In this case, the size of the instruction word is reduced,
but unconditional GOTO instructions are necessary, as shown later.
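The need for unconditional GOTO instructions on a 1-address machine can be seen in a small sketch. The layout below is one possible 1-address encoding of the Example 3.1 program, constructed here for illustration only (it is not taken from the original work); `run_1addr` and the tuple encoding are likewise assumptions.

```python
# Each branch jumps to ADDR when the selected variable is 1, else PC + 1.
CODE = [
    ("branch", "X1", 5),   # 0  N0: x1 = 1 -> N1, else fall through to N2
    ("branch", "X3", 7),   # 1  N2: x3 = 1 -> N4, else fall through to N3
    ("branch", "X4", 4),   # 2  N3: x4 = 1 -> T1, else fall through to T0
    ("output", 0, 0),      # 3  T0
    ("output", 9, 0),      # 4  T1
    ("branch", "X2", 10),  # 5  N1: x2 = 1 -> T4, else fall through
    ("goto", 1),           # 6  unconditional GOTO N2 (0-branch of N1)
    ("branch", "X4", 9),   # 7  N4: x4 = 1 -> T3, else fall through to T2
    ("output", 10, 0),     # 8  T2
    ("output", 11, 0),     # 9  T3
    ("output", 15, 0),     # 10 T4
]


def run_1addr(code, inputs, pc=0):
    """Run until the first Output; return (output value, instructions executed)."""
    steps = 0
    while True:
        op, *args = code[pc]
        steps += 1
        if op == "output":
            return args[0], steps
        if op == "goto":
            pc = args[0]
        else:  # branch
            index, addr = args
            pc = addr if inputs[index] else pc + 1
```

In this layout node N1 cannot fall through to N2, because N2 already follows N0, so one extra GOTO instruction is needed: 11 instructions instead of the 10 of the 2-address program.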

3.2  4-Address  QDD  Machine 

By evaluating two binary variables and by increasing the number of
branch addresses to four, we have a branch instruction for a
4-address QDD machine. Since it evaluates two binary variables at a
time, it can reduce the evaluation time to half that of the 2-address
BDD machine.

A branching program for 4-address QDD machines consists of two kinds
of instructions:

Q_Branch(ADDR0, ADDR1, ADDR2, ADDR3), INDEX
Output DATA, and GOTO ADDR


Fig. 3.4 shows the format for the branch instruction. Fig. 3.3 shows
the architecture of the 4-address QDD machine, where only the circuit
for the branching operation is shown. The first field of the
branching instruction specifies the branch command. The second field,
INDEX, specifies the index i of the input variable $X_i$, i.e., which
variables to select. In the case of a QDD, two consecutive binary
variables are selected at a time. The input selector shown in
Fig. 3.3 produces $X_i$. The upper multiplexer selects the variable.
When $X_i = (0,0)$, ADDR0 is selected; when $X_i = (0,1)$, ADDR1 is
selected; when $X_i = (1,0)$, ADDR2 is selected; and when
$X_i = (1,1)$, ADDR3 is selected. The selected address is then loaded
into the program counter (PC). In this way, the next address is
specified as a function of INDEX i and the input variable $X_i$. Note
that this instruction requires a rather long word, which would be
expensive for embedded applications.

Fig. 3.5 shows the format for the output instruction. The left field
specifies the instruction type: Output. The middle field contains the
address to which the program should jump. The right field is the
output value, as shown at the bottom of the QDD.
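The four-way branch can be sketched in software as follows. The program below is a hypothetical QDD program obtained by pairing (x1, x2) and (x3, x4) in the MTBDD of Example 3.1; the encoding and the names `select_pair` and `run_qdd4` are assumptions for illustration, and the sketch does not model the instruction word format of Figs. 3.4 and 3.5.

```python
def select_pair(bits, index):
    """Input selector: X_index = (x_{2*index-1}, x_{2*index}), 1-based index."""
    first, second = bits[2 * index - 2], bits[2 * index - 1]
    return 2 * first + second   # (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3


QDD_PROGRAM = {
    "N0": ("q_branch", 1, ("N2", "N2", "N2", "T4")),
    "N2": ("q_branch", 2, ("T0", "T1", "T2", "T3")),
    "T0": ("output", 0, "N0"),
    "T1": ("output", 9, "N0"),
    "T2": ("output", 10, "N0"),
    "T3": ("output", 11, "N0"),
    "T4": ("output", 15, "N0"),
}


def run_qdd4(program, bits, start="N0"):
    """Run until the first Output; return (output value, instructions executed)."""
    pc, steps = start, 0
    while True:
        instr = program[pc]
        steps += 1
        if instr[0] == "output":
            return instr[1], steps
        _, index, addrs = instr
        pc = addrs[select_pair(bits, index)]   # ADDR0..ADDR3
```

With the bit list `[x1, x2, x3, x4]`, every input reaches an Output instruction after at most three instructions, compared with up to five for the 2-address BDD program of Example 3.1.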

3.3  3-Address  QDD  Machine 

Since the 4-address QDD instruction requires a long word, we
developed a 3-address QDD machine. The branch instruction for the
3-address QDD machine contains only three address fields. For
example, consider the instruction shown in Fig. 3.6. This instruction
is symbolically denoted by

Q_Branch(+1, ADDR1, ADDR2, ADDR3), INDEX.

In this instruction, ADDR1, ADDR2, and ADDR3 are specified, but ADDR0
is missing. ADDR0 is replaced by “+1”, which denotes the address
following the current instruction. This instruction performs the
following operation:

•  Let i be the value of the input variable specified by INDEX. If
i = 0, then go to the address following the current instruction;
otherwise go to ADDRi.

Lemma 3.1 An arbitrary QDD can be evaluated by a program consisting
of the following instructions:

Q_Branch(+1, ADDR1, ADDR2, ADDR3), INDEX
GOTO ADDR
Output DATA, and GOTO ADDR

For example, the instruction for the 4-address QDD machine

Q_Branch(ADDR0, ADDR1, ADDR2, ADDR3), INDEX

can be simulated by the pair of instructions:

Q_Branch(+1, ADDR1, ADDR2, ADDR3), INDEX
GOTO ADDR0

Note that the last instruction is an unconditional GOTO statement. As
shown in the next section, the number of unconditional GOTO
statements can be minimized by an optimization algorithm. Fig. 3.7
shows the architecture of the 3-address QDD machine, where only the
circuit for branching operations is shown. Consider the instruction
in Fig. 3.6. When the value of the input variables specified by INDEX
is non-zero, the machine works similarly to the 4-address QDD
machine. When that value is equal to 0, the content of the program
counter (PC) is incremented by one to access the next address.

Figure 3.4. Branch Instruction for 4-address QDD Machine.

Figure 3.5. Output Instruction for a QDD Machine (fields: Output,
Address, Output Values).

Figure 3.7. 3-address QDD Machine.
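The simulation used in Lemma 3.1 amounts to a mechanical translation, sketched below. The list-of-pairs program encoding and the name `to_3address` are assumptions for illustration. This naive translation adds one GOTO per branch instruction; it is the unoptimized starting point, not the optimized code discussed in Section 4.

```python
def to_3address(program4):
    """Translate a 4-address program into 3-address form, one GOTO per branch.

    `program4` is a list of (label, instruction) pairs in address order.
    """
    program3 = []
    for label, instr in program4:
        if instr[0] == "q_branch":
            _, index, (a0, a1, a2, a3) = instr
            # ADDR0 is replaced by "+1" (fall through to the next address) ...
            program3.append((label, ("q_branch", index, ("+1", a1, a2, a3))))
            # ... and the next address holds an unconditional GOTO ADDR0.
            program3.append((None, ("goto", a0)))
        else:
            program3.append((label, instr))
    return program3


example = [
    ("N0", ("q_branch", 1, ("N2", "N2", "N2", "T4"))),
    ("N2", ("q_branch", 2, ("T0", "T1", "T2", "T3"))),
    ("T0", ("output", 0, "N0")),
]
translated = to_3address(example)
```

Here `translated` has five instructions: each of the two branches is followed by its own GOTO, and the output instruction is copied unchanged.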

In the real system, we use the four types of branch instructions
shown in Fig. 3.8. Distinguishing the four types requires two
additional bits in the instruction field. However, as shown in the
experimental results, using four branch instructions reduces both the
number of instructions and the total bit size, so the cost of these
extra bits is fully compensated.
Branch0   Index   ADDR1   ADDR2   ADDR3
Branch1   Index   ADDR0   ADDR2   ADDR3
Branch2   Index   ADDR0   ADDR1   ADDR3
Branch3   Index   ADDR0   ADDR1   ADDR2

Figure 3.8. Four Types of Branch Instructions for 3-address QDD
Machine.


4  Optimization  of  Codes  for  QDD  Machines 


In this section, we consider a method to reduce the number of
instructions for QDD machines.

Definition 4.1 Given a QDD and an order of the input variables (e.g.,
$x_1, x_2, \ldots, x_n$), the code size CSIZE is the number of
instructions needed to evaluate the decision diagram on a given
machine. Let 4aQDDM denote a 4-address QDD machine, and let 3aQDDM
denote a 3-address QDD machine.

Lemma 4.1 Let $N_N$ be the number of non-terminal nodes, and let
$N_T$ be the number of terminal nodes in a QDD. We have the following
relation:

$\mathrm{CSIZE}(\mathrm{4aQDDM}) = N_N + N_T$    (4.1)

(Proof) In a 4-address QDD machine, a non-terminal node is
represented by a branch instruction, and a terminal node is
represented by an output instruction. (Q.E.D.)

Lemma 4.2 Let $N_N$ be the number of non-terminal nodes, and let
$N_T$ be the number of terminal nodes in a QDD. Let $N_U$ be the
number of unconditional GOTO statements that are not part of output
statements. Then, we have the following relations:

$\mathrm{CSIZE}(\mathrm{3aQDDM}) = N_U + N_N + N_T$    (4.2)
$0 \le N_U \le N_N$    (4.3)

(Proof) In a 3-address QDD machine, a non-terminal node is
represented by either a branch instruction or a pair consisting of a
branch instruction and an unconditional GOTO statement. Also, a
terminal node is represented by an output instruction. Thus, the
number of unconditional GOTO statements is at most the number of
non-terminal nodes. (Q.E.D.)

In  the  case  of  a  4-address  QDD  machine,  there  is  no  code 
optimization  problem,  i.e.,  the  instructions  can  be  generated 
in  any  order.  However,  in  the  case  of  a  3-address  QDD 
machine,  the  length  of  the  program  depends  on  the  order  of 
instructions. 


Figure  4.1.  QDD  for  Example  Function. 


Example 4.1 Consider the QDD shown in Fig. 4.1. It has five
non-terminal nodes and four terminal nodes. When the code is
generated in the breadth-first order, i.e., in the order of $X_1$,
$X_2$, and $X_3$, we have the following:

/** Code with Unconditional GOTO **/

N0: Q_Branch(+1, N1, N1, N1), X1
    Q_Branch(+1, N3, N3, N3), X2
    GOTO N2
N1: Q_Branch(+1, T3, T3, T3), X2
    GOTO N3
N2: Q_Branch(+1, T1, T1, T1), X3
    GOTO T0
N3: Q_Branch(+1, T2, T2, T2), X3
    GOTO T1
T0: Output 0, and GOTO N0
T1: Output 1, and GOTO N0
T2: Output 2, and GOTO N0
T3: Output 3, and GOTO N0

Note that the above program has four unconditional GOTO statements
that are not part of output statements. However, when the code is
generated in the depth-first order, it has no unconditional GOTO
statements that are not part of output statements:

/** Code without Unconditional GOTO **/

N0: Q_Branch(+1, N1, N1, N1), X1
    Q_Branch(+1, N3, N3, N3), X2
    Q_Branch(+1, T1, T1, T1), X3
T0: Output 0, and GOTO N0
N1: Q_Branch(+1, T3, T3, T3), X2
N3: Q_Branch(+1, T2, T2, T2), X3
T1: Output 1, and GOTO N0
T2: Output 2, and GOTO N0
T3: Output 3, and GOTO N0

Note that the first four instructions correspond to the leftmost path
from the root node to the terminal node T0. The next three
instructions correspond to the path from the node N1 through the node
N3 to the terminal node T1.

(End of Example)
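The two listings of Example 4.1 can be checked against each other with a small software interpreter for the 3-address instruction set. This is a sketch under assumptions: the tuple encoding and the names `assemble`, `run_qdd3`, `q`, and `out` are hypothetical, and each super variable is given directly as a value from 0 to 3.

```python
from itertools import product


def assemble(lines):
    """Resolve labels to addresses; each line is (label or None, instruction)."""
    addr = {label: i for i, (label, _) in enumerate(lines) if label}
    return [instr for _, instr in lines], addr


def run_qdd3(lines, X, start="N0"):
    """X maps INDEX (1, 2, 3) to the value 0..3 of that super variable."""
    code, addr = assemble(lines)
    pc, steps = addr[start], 0
    while True:
        op, *args = code[pc]
        steps += 1
        if op == "output":
            return args[0], steps
        if op == "goto":
            pc = addr[args[0]]
        else:  # Q_Branch(+1, ADDR1, ADDR2, ADDR3), INDEX
            index, *targets = args
            v = X[index]
            pc = pc + 1 if v == 0 else addr[targets[v - 1]]


def q(index, a1, a2, a3):
    return ("q", index, a1, a2, a3)


def out(data):
    return ("output", data, "N0")


OUTPUTS = [("T0", out(0)), ("T1", out(1)), ("T2", out(2)), ("T3", out(3))]
BREADTH_FIRST = [
    ("N0", q(1, "N1", "N1", "N1")), (None, q(2, "N3", "N3", "N3")),
    (None, ("goto", "N2")),
    ("N1", q(2, "T3", "T3", "T3")), (None, ("goto", "N3")),
    ("N2", q(3, "T1", "T1", "T1")), (None, ("goto", "T0")),
    ("N3", q(3, "T2", "T2", "T2")), (None, ("goto", "T1")),
] + OUTPUTS
DEPTH_FIRST = [
    ("N0", q(1, "N1", "N1", "N1")), (None, q(2, "N3", "N3", "N3")),
    (None, q(3, "T1", "T1", "T1")), ("T0", out(0)),
    ("N1", q(2, "T3", "T3", "T3")), ("N3", q(3, "T2", "T2", "T2")),
] + OUTPUTS[1:]

# The two layouts should compute the same output for all 4^3 input values.
same = all(
    run_qdd3(BREADTH_FIRST, dict(zip((1, 2, 3), v)))[0]
    == run_qdd3(DEPTH_FIRST, dict(zip((1, 2, 3), v)))[0]
    for v in product(range(4), repeat=3)
)
```

The breadth-first listing has 13 instructions and the depth-first listing has 9, which equals the number of nodes of the QDD. For the all-zero input the breadth-first code executes six instructions and the depth-first code four.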

The code optimization problem for a 3-address QDD machine can be
reduced to a graph covering problem as follows:

Definition 4.2 A path cover of a QDD is a set of paths such that
every node in the QDD belongs to exactly one path. A minimal path
cover is a path cover with the fewest paths. A path in a QDD can
consist of just one node.

Theorem 4.1 An optimal code for a 3-address QDD machine corresponds
to a minimal disjoint path cover of the QDD.

(Proof) A path in a QDD corresponds to a sequence of Q_Branch
instructions followed by an output instruction. A sequence of
Q_Branch instructions without an output instruction requires an
unconditional GOTO statement. By Lemma 4.2, minimization of the
number of unconditional GOTO statements minimizes the code size.
(Q.E.D.)
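The correspondence in Theorem 4.1 can be illustrated with a simple greedy path cover. This is a heuristic sketch only: it is neither the authors' algorithm nor guaranteed to find a minimal cover. It assumes that a node may fall through to any one of its children, as the four instruction types of Fig. 3.8 permit. The graph is the QDD implied by the code of Example 4.1, and the name `A` for its unlabeled $X_2$ node is hypothetical.

```python
def greedy_path_cover(children, root):
    """Cover all nodes reachable from `root` with vertex-disjoint paths."""
    order, seen = [], set()

    def visit(u):  # depth-first preorder from the root
        if u not in seen:
            seen.add(u)
            order.append(u)
            for v in children.get(u, ()):
                visit(v)

    visit(root)

    covered, paths = set(), []
    for start in order:
        if start in covered:
            continue
        path, u = [start], start
        covered.add(start)
        while True:
            # Extend the path with the first child not yet on any path.
            nxt = next((v for v in children.get(u, ()) if v not in covered), None)
            if nxt is None:
                break
            path.append(nxt)
            covered.add(nxt)
            u = nxt
        paths.append(path)
    return paths


def count_gotos(paths, children):
    """A path that ends in a non-terminal node needs one unconditional GOTO."""
    return sum(1 for p in paths if p[-1] in children)


# Non-terminal nodes of the Example 4.1 QDD with children for values 0..3.
CHILDREN = {
    "N0": ("A", "N1", "N1", "N1"),
    "A": ("N2", "N3", "N3", "N3"),
    "N1": ("N3", "T3", "T3", "T3"),
    "N2": ("T0", "T1", "T1", "T1"),
    "N3": ("T1", "T2", "T2", "T2"),
}
PATHS = greedy_path_cover(CHILDREN, "N0")
```

On this graph the sketch returns four paths, one per terminal node, so no path ends in a non-terminal node and no unconditional GOTO is needed, in agreement with the depth-first code of Example 4.1.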

5  Experiment  and  Observation 

5.1  Benchmark  Results 

To see the effectiveness of QDDs over BDDs, and the effectiveness of
the code optimization, we realized certain benchmark functions by
BDDs and QDDs. First, we compare QDDs and BDDs with respect to the
number of nodes. Then, we convert these into code for BDD and QDD
machines, and compare QDDs and BDDs with respect to the number of
instructions.

Table 5.1 shows the experimental results. Func. Name denotes the name
of the benchmark function; # Inp. denotes the number of input
variables; # Out. denotes the number of outputs; BDD Nodes denotes
the number of nodes of the MTBDD, including both terminal and
non-terminal nodes; Opt. Codes under BDD denotes the number of
instructions of the optimized code for the 1-address BDD machine
(near-optimal solution); Term. Nodes denotes the number of terminal
nodes; Aver. Inst. under BDD denotes the average number of
instructions to evaluate an input vector by a 1-address BDD machine;
QDD Nodes denotes the number of nodes of the MTQDD, including both
terminal and non-terminal nodes, which is the same as the number of
instructions for a 4-address QDD machine; X=00 Codes under QDD
denotes the number of instructions in the code for the 3-address QDD
machine when only the first type of instruction in Fig. 3.8 is used;
Opt. Codes under QDD denotes the number of instructions of the
optimized code for the 3-address QDD machine when all four types of
instructions in Fig. 3.8 are used to minimize the number of GOTO
statements; X=00 GOTO denotes the number of GOTO statements when only
one type of branch instruction is used; Opt. GOTO (= Opt. Codes -
QDD Nodes) under QDD denotes the number of GOTO statements when four
types of branch instructions are used; Aver. Inst. under QDD denotes
the average number of instructions to evaluate an input vector by a
3-address QDD machine; and Ratio denotes the value (Aver. Inst. of
the 1-address BDD machine) / (Aver. Inst. of the 3-address QDD
machine).
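As a quick check of how the Ratio column is defined, the following sketch recomputes it for four rows. The average instruction counts are copied from Table 5.1; the dictionary and the function name `ratio` are illustrative assumptions.

```python
# (Aver. Inst. of the 1-address BDD machine, Aver. Inst. of the 3-address
# QDD machine) for four benchmark functions, copied from Table 5.1.
AVER_INST = {
    "C432": (19.10, 12.73),
    "in6": (7.51, 5.88),
    "mlp6": (12.10, 5.98),
    "signet": (18.23, 13.31),
}


def ratio(bdd_avg, qdd_avg):
    """Speed ratio as defined for the last column, rounded to two decimals."""
    return round(bdd_avg / qdd_avg, 2)


RATIOS = {name: ratio(*pair) for name, pair in AVER_INST.items()}
```

The recomputed values for these rows match the Ratio column, including the smallest (in6) and largest (mlp6) ratios in the table.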

5.2  Detail  of  the  Experiment 

Optimization of Decision Diagrams: First, the variable ordering that
minimizes the size of the MTBDD is obtained. Then, the input
variables are partitioned into groups of two variables in the natural
order to obtain the MTQDDs.

Optimization of Codes: Theorem 4.1 shows how to minimize the number
of GOTO statements. The algorithm given in [11] is applicable only to
programs with nodes whose in-degrees and out-degrees are both two.
So, we developed our own algorithm to obtain near-optimal solutions
for our more general case.

5.3  Observations 

From  the  table,  we  can  observe  the  following: 

•  The number of nodes in QDDs is smaller than that of BDDs.

•  The number of instructions for the 3-address QDD machine can be
considerably reduced by an optimization algorithm.

•  For C432, in3, misex2, misj, and risc, the number of GOTO
statements in the optimized QDD codes is zero. This means that
optimal code is generated for these functions. Also, for these
functions, optimal code for BDD machines is generated.

•  signet requires many GOTO statements in both BDD and QDD machines.
The number of GOTO statements for a BDD machine is given by

(Opt. Codes) - (BDD Nodes) = 8671 - 7347 = 1324.

•  Opt. Codes, the number of instructions for a 3-address QDD
machine, is often larger than QDD Nodes, the number of instructions
for a 4-address QDD machine. The column headed by Opt. GOTO (= Opt.
Codes - QDD Nodes) shows the extra GOTOs. Except for a few functions,
the number of extra GOTOs is rather small.

•  Consider the value (Sum of X=00 Codes) - (Sum of Opt. Codes) =
28535 - 24528 = 4007. This shows the total number of instructions
saved by using four types of branch instructions instead of only one
type. However, to specify four types of instructions, we need two
additional bits in the instruction field. Let w be the number of bits
in a word in the 3-address QDD machine where only one type of branch
instruction is used. Then, the merit of using four types of
instructions is accurately expressed as (Sum of X=00 Codes) × w -
(Sum of Opt. Codes) × (w + 2) = $28535w - 24528(w + 2) = 4007w -
49056$. Note that, in most cases, w > 20, so we can conclude that the
use of four types of Q_Branch instructions reduces the total number
of bits.

•  The last column of the table shows that the 3-address QDD machine
is 1.28 to 2.02 times faster than the 1-address BDD machine. Note
that for mlp6, the ratio is greater than 2.

Table 5.1. Number of Nodes and Code Sizes for BDD Machine and QDD Machine.

                        |---------- BDD ---------|  |----------------- QDD ----------------|
Func.         #      #    BDD   Opt.  Term.  Aver.    QDD   X=00   Opt.   X=00   Opt.  Aver.
Name       Inp.   Out.  Nodes  Codes  Nodes  Inst.  Nodes  Codes  Codes   GOTO   GOTO  Inst.  Ratio
C432         36      7   1779   1779    128  19.10   1027   1408   1027    381      0  12.73   1.50
amd          14     24    206    206     84   5.63    164    171    164      7      0   3.47   1.62
apex2        39      3    335    363      8   6.66    231    332    265    101     34   4.99   1.33
apex4         9     19    749    750    319   8.24    600    639    601     39      1   4.61   1.79
chkn         29      7    220    241     28   7.01    157    215    172     58     15   5.16   1.36
duke2        22     29    636    637    255   6.36    546    594    547     48      1   4.09   1.55
gary         15     11    228    232     70   5.51    173    191    174     18      1   3.42   1.61
in0          15     11    195    200     52   5.02    145    170    148     25      3   2.92   1.72
in1          16     17    284    299     55   6.85    217    288    229     71     12   4.70   1.46
in2          19     10    291    296     73   3.98    219    262    225     43      6   2.60   1.53
in3          35     29    259    259     72   6.63    214    234    214     20      0   4.77   1.39
in4          32     20    607    611    178   4.69    491    569    495     78      4   3.44   1.36
in5          24     14    461    466    134   8.54    369    452    371     83      2   6.57   1.30
in6          33     23   4325   4338   1638   7.51   3546   3815   3555    269      9   5.88   1.28
in7          26     10    300    301    112   7.58    256    275    256     19      0   5.84   1.30
m181         15      9    222    222     84   6.80    196    217    196     21      0   4.71   1.44
misex2       25     18    113    113     35   4.97     91     96     91      5      0   3.60   1.38
misex3       14     14   2910   2975   1041   7.55   1773   2159   1773    386      0   4.05   1.86
misj         35     14   4656   4656   1408  14.12   3275   3828   3275    553      0   9.57   1.47
mlp6         12     12   5270   6062   1238  12.10   2582   2966   2694    384    112   5.98   2.02
risc          8     31     56     56     28   4.42     44     44     44      0      0   2.55   1.74
signet       39      8   7347   8652    128  18.23   5671   8374   6907   2703   1236  13.31   1.37
tial         14      8    697    790     49  12.05    388    552    466    164     78   6.37   1.89
vg2          25      8    131    135     24   7.65     89    110     91     21      2   5.62   1.36
x1dn         27      6    200    218     18   9.55    126    171    141     45     15   5.74   1.66
x6dn         39      5    214    231     28   4.14    159    215    177     56     18   2.74   1.52
x9dn         27      7    204    222     22   9.30    140    188    157     48     17   5.80   1.60
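The bit-count argument in the observations can be turned into a short calculation of the break-even word width. The two totals are the sums quoted in the text; the function names `bits_saved` and `break_even_width` are illustrative assumptions.

```python
X00_CODES = 28535   # sum of X=00 Codes (one branch type), as quoted in the text
OPT_CODES = 24528   # sum of Opt. Codes (four branch types), as quoted in the text


def bits_saved(w: int) -> int:
    """Total bits saved by four branch types when the one-type word has w bits."""
    return X00_CODES * w - OPT_CODES * (w + 2)   # = 4007*w - 49056


def break_even_width() -> int:
    """Smallest integer w for which four branch types need fewer bits in total."""
    w = 1
    while bits_saved(w) <= 0:
        w += 1
    return w
```

Under these totals the saving becomes positive from w = 13 bits onward, so the stated condition w > 20 leaves a comfortable margin; at w = 20 the saving is 31084 bits.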

6  Conclusions  and  Comments 

In this paper, we considered a branching program machine to evaluate
multiple-output logic functions. To increase the speed of evaluation,
we used QDDs instead of BDDs. To reduce the memory size, we used
3-address QDD machines instead of 4-address QDD machines. We proposed
the use of four types of branch instructions. Also, we considered a
method to optimize codes for 3-address QDD machines. This is
different from existing methods to optimize the decision diagrams.
For various benchmark functions, we optimized the codes, and showed
the effectiveness of the approach.

To show the usefulness of QDD machines, we have developed a parallel
branching program machine (PBM128) consisting of 128 QDD machines and
a programmable interconnection on an Altera Stratix II FPGA. We
realized many benchmark functions on PBM128, and compared its memory
size and computation time with those of Intel's Core2Duo
microprocessor. PBM128 requires approximately a quarter of the memory
of the Core2Duo, and is 21.4 to 96.1 times faster than the Core2Duo.
Details are shown in [18].